-
Abstract

Background: Systematic literature reviews (SLRs) are foundational for synthesizing evidence across diverse fields and are especially important in guiding research and practice in health and biomedical sciences. However, they are labor intensive due to manual data extraction from multiple studies. As large language models (LLMs) gain attention for their potential to automate research tasks and extract basic information, understanding their ability to accurately extract explicit data from academic papers is critical for advancing SLRs.

Objective: Our study aimed to explore the capability of LLMs to extract both explicitly outlined study characteristics and deeper, more contextual information requiring nuanced evaluations, using ChatGPT (GPT-4).

Methods: We screened the full text of a sample of COVID-19 modeling studies and analyzed three basic measures of study settings (ie, analysis location, modeling approach, and analyzed interventions) and three complex measures of behavioral components in models (ie, mobility, risk perception, and compliance). To extract data on these measures, two researchers independently extracted 60 data elements using manual coding and compared them with the responses from ChatGPT to 420 queries spanning 7 iterations.

Results: ChatGPT’s accuracy improved as prompts were refined, showing improvements of 33% and 23% between the initial and final iterations for extracting study settings and behavioral components, respectively. In the initial prompts, 26 (43.3%) of 60 ChatGPT responses were correct. However, in the final iteration, ChatGPT extracted 43 (71.7%) of the 60 data elements, showing better performance in extracting explicitly stated study settings (28/30, 93.3%) than in extracting subjective behavioral components (15/30, 50%). Nonetheless, the varying accuracy across measures highlighted its limitations.

Conclusions: Our findings underscore LLMs’ utility in extracting basic as well as explicit data in SLRs by using effective prompts. However, the results reveal significant limitations in handling nuanced, subjective criteria, emphasizing the necessity for human oversight.

Free, publicly accessible full text available September 1, 2026.
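As an illustration of the comparison described in the Methods and Results above, the following Python sketch shows one way agreement between manual coding and LLM responses could be tallied per prompt iteration and measure type. The records, field layout, and exact-match rule are hypothetical placeholders for illustration, not the study's actual data or coding protocol.

```python
# Illustrative sketch: compare LLM-extracted data elements with manual coding
# and report accuracy per prompt iteration and per measure type.
# The records below are hypothetical; the study's 60 elements x 7 iterations
# are not reproduced here.

from collections import defaultdict

# Each record: (iteration, measure_type, element_id, manual_value, llm_value)
records = [
    (1, "study_setting", "location_paper01", "United States", "United States"),
    (1, "behavioral",    "mobility_paper01", "included", "not included"),
    (7, "study_setting", "location_paper01", "United States", "United States"),
    (7, "behavioral",    "mobility_paper01", "included", "included"),
]

def accuracy(rows):
    """Share of elements where the LLM answer matches the manual code."""
    if not rows:
        return float("nan")
    correct = sum(1 for _, _, _, manual, llm in rows
                  if manual.strip().lower() == llm.strip().lower())
    return correct / len(rows)

by_group = defaultdict(list)
for rec in records:
    by_group[(rec[0], rec[1])].append(rec)

for (iteration, measure_type), rows in sorted(by_group.items()):
    print(f"iteration {iteration} | {measure_type:13s} | "
          f"accuracy = {accuracy(rows):.1%}")
```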
-
Abstract

Systems thinking (ST) includes a set of critical skills and approaches for addressing today's complex societal problems. Therefore, it has been introduced into the curricula of many educational programmes around the world. Despite all the attention to ST, there is less consensus when it comes to evaluating and assessing ST skills. In particular, a quantitative assessment approach that captures ST's multi‐dimensionality is crucial when evaluating the degree to which one has learned and utilizes ST. This paper proposes a systematic approach to create such a multi‐dimensional Index of ST from textual data. Initially, we provide an overview of the theoretical background as it pertains to different measurement approaches of ST skills. Then we provide a conceptual framework based on ST skill measures and transform this framework into a quantifiable model. Finally, using student data, we provide an illustration of an integrated index of ST skills. We compute this index by using a mixed-methods approach, including robust principal component analysis, data envelopment analysis, and a two‐stage bootstrapping approach. The results show that (i) our model serves as a systematic multi‐dimensional ST approach by including multiple measures of ST skills and (ii) international student status and self‐reported math skills are found to be significant predictors of one's level of ST in the graduate student dataset (N = 30); however, no significant factors are found in the first‐year engineering student dataset (N = 144).
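The Python sketch below is a loose, simplified illustration of how such a composite index and a second-stage predictor check might be computed. It substitutes ordinary PCA on standardized scores for the paper's robust PCA, omits the data envelopment analysis step entirely, and runs on random placeholder data; the variable names and the 1-5 math-skill rating are assumptions for illustration only.

```python
# Simplified sketch: combine several ST skill measures into one index (stage 1)
# and bootstrap the relationship between that index and a predictor (stage 2).
# Ordinary PCA stands in for robust PCA, the DEA step is omitted, and the data
# are random placeholders rather than the study's student data.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_students = 30
skill_scores = rng.normal(size=(n_students, 5))    # 5 hypothetical ST measures
math_self_report = rng.integers(1, 6, n_students)  # hypothetical 1-5 rating

def st_index(X):
    """First principal component of the standardized skill measures."""
    Z = StandardScaler().fit_transform(X)
    return PCA(n_components=1).fit_transform(Z).ravel()

index = st_index(skill_scores)

# Bootstrap the slope of the index on the self-reported math rating and
# report a percentile confidence interval.
slopes = []
for _ in range(2000):
    sample = rng.integers(0, n_students, n_students)
    slope = np.polyfit(math_self_report[sample], index[sample], 1)[0]
    slopes.append(slope)
lo, hi = np.percentile(slopes, [2.5, 97.5])
print(f"bootstrap slope 95% CI: ({lo:.3f}, {hi:.3f})")
```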
-
Abstract

Self-report assessments are used frequently in higher education to assess a variety of constructs, including attitudes, opinions, knowledge, and competence. Systems thinking is an example of one competence often measured using self-report assessments, where individuals answer several questions about their perceptions of their own skills, habits, or daily decisions. In this study, we define systems thinking as the ability to see the world as a complex interconnected system where different parts can influence each other, and the interrelationships determine system outcomes. An alternative, less common assessment approach is to measure skills directly by providing a scenario about an unstructured problem and evaluating respondents’ judgment or analysis of the scenario (scenario-based assessment). This study explored the relationships between engineering students’ performance on self-report assessments and scenario-based assessments of systems thinking, finding that there were no significant relationships between the two assessment techniques. These results suggest that there may be limitations to using self-report assessments as a method to assess systems thinking and other competencies in educational research and evaluation, which could be addressed by incorporating alternative formats for assessing competence. Future work should explore these findings further and support the development of alternative assessment approaches.
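A minimal sketch of the kind of association test that could relate the two assessment formats is shown below. The scores are random placeholders, and the choice of a Pearson correlation is an assumption for illustration, not necessarily the statistic reported in the study.

```python
# Illustrative sketch: test whether self-report and scenario-based
# systems-thinking scores are related. Scores are random placeholders.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
self_report = rng.normal(size=40)       # e.g., mean Likert-scale score
scenario_based = rng.normal(size=40)    # e.g., rubric score on a scenario

r, p = stats.pearsonr(self_report, scenario_based)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")
# A non-significant p-value here would mirror the paper's finding of no
# significant relationship between the two assessment formats.
```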
-
Abstract

Cognitive maps, or mental maps, are externalized portrayals of mental models—people's mental representations of reality and their presumptions about how the world works. They are often used as the intermediary step toward uncovering individuals' presumptions of the outside world. Yet, the next step is often vague: once one's understanding of the real world is mapped, how can we systematically evaluate the maps and compare and contrast them? In this note, we review several common approaches to analyzing cognitive maps, some rooted in network theories, and apply them to a dataset of 30 graduate students who analyzed a complex socioenvironmental problem. Our analysis shows that these methods provide inconsistent results and often fall short of capturing variations in mental models. The analysis points to a lack of effective methods for examining such maps and helps articulate a major research problem for systems‐thinking scholars. © 2023 System Dynamics Society.
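For readers unfamiliar with the network-based approaches mentioned above, the toy example below computes a few structural measures commonly applied to cognitive maps (link-to-node ratio, density, degree centrality). The map itself is a made-up illustration, not one of the 30 student maps analyzed in the note.

```python
# Illustrative sketch: treat a cognitive map as a directed graph and compute
# common structural measures. The map is a hypothetical toy example.

import networkx as nx

cmap = nx.DiGraph()
cmap.add_edges_from([
    ("water demand", "lake level"),
    ("agriculture", "water demand"),
    ("dam construction", "lake level"),
    ("lake level", "salinity"),
    ("salinity", "ecosystem health"),
])

n_nodes = cmap.number_of_nodes()
n_links = cmap.number_of_edges()
print("nodes:", n_nodes, "links:", n_links)
print("link-to-node ratio:", n_links / n_nodes)
print("density:", nx.density(cmap))
print("most central concepts:",
      sorted(nx.degree_centrality(cmap).items(), key=lambda kv: -kv[1])[:3])
```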
-
The Lake Urmia vignette: a tool to assess understanding of complexity in socio‐environmental systems

Abstract

We introduce the Lake Urmia Vignette (LUV) as a tool to assess individuals' understanding of complexity in socio‐environmental systems. LUV is based on a real‐world case and includes a short vignette describing an environmental catastrophe involving a lake. Over a few decades, significant issues have manifested themselves at the lake because of various social, political, economic, and environmental factors. We design a rubric for assessing responses to a prompt. A pilot test with a sample of 30 engineering graduate students is conducted. We compare responses to LUV with other measures. Our findings suggest that students' understanding of complexity is positively associated with their understanding of systems concepts such as feedback loops but not with other possible variables such as self‐reported systems thinking skills or systems‐related coursework. Based on the provided instructions, researchers can use LUV as a novel assessment tool to examine understanding of complexity in socio‐environmental systems. © 2020 System Dynamics Society.
